Fault-Tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support

Authors

  • Florian Haas
  • Sebastian Weis
  • Theo Ungerer
  • Gilles Pokam
  • Youfeng Wu
Abstract

The demand for fault-tolerant execution on high-performance computer systems increases due to higher fault rates resulting from smaller structure sizes. As an alternative to hardware-based lockstep solutions, software-based fault-tolerance mechanisms can increase the reliability of multi-core commercial-off-the-shelf (COTS) CPUs while being cheaper and more flexible. This paper proposes a software/hardware hybrid approach, which targets Intel’s current x86 multi-core platforms of the Core and Xeon family. We leverage hardware transactional memory (Intel TSX) to support implicit checkpoint creation and fast rollback. Redundant execution of processes and signature-based comparison of their computations provide error detection, and transactional wrapping enables error recovery. Existing applications are enhanced towards fault-tolerant redundant execution by post-link binary instrumentation. Hardware enhancements to further increase the applicability of the approach are proposed and evaluated with SPEC CPU 2006 benchmarks. The resulting performance overhead is 47% on average, assuming the existence of the proposed

Similar resources

A Fault Observant Real-Time Embedded Design for Network-on-Chip Control Systems

Performance and time-to-market requirements cause many real-time designers to consider commercial off-the-shelf (COTS) components for real-time systems. Massive multi-core embedded processors with network-on-chip (NoC) designs to facilitate core-to-core communication are becoming common in COTS. These architectures benefit real-time scheduling, but they also pose predictability challenges. In this work, w...

FaulTM-multi: Fault Tolerance for Multithreaded Applications Running on Transactional Memory Hardware

Fault-tolerance has become an essential concern for processor designers due to increasing transient and permanent fault rates. Executing instruction streams redundantly in chip multi processors (CMP) provides high reliability since it can detect both transient and permanent faults and silent data corruptions. However, comparing the results of the instruction streams, checkpointing the entire sy...

Designing High Performance Computing Architectures for Reliable Space Applications

Future space applications will demand computing architectures with high performance capabilities. In order to improve the on-board computing power, Commercial Off The Shelf (COTS) multi- and many-core processor technologies have to be introduced in the design process of spacecraft computing platforms. Such technologies will be able to reduce the performance gap between on-board and ground com...

Predictable transactional memory architecture for hierarchical mixed-criticality systems

A transactional memory simplifies the concurrency management in multicore systems by permitting sets of load and store instructions to be executed in an atomic way. The correct results for concurrent transactions and the execution time strongly depend on the coherency potentials, rollback capabilities and strategies of the transactional memory. A transactional memory can be implemented as a Har...

— Syllabus — Software Transactional Memory

“Transactional memory” means that, instead of using explicit locks to protect from data races (like the Java synchronized construct), a programmer could declare a block of code atomic and rely on the underlying software and hardware system to ensure that code within the block executes as if atomic. The transactional memory system might use any of several concurrency control protocols, with some...

Publication date: 2017